M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search
Learning to walk over a graph towards a target node for a given query and a source node is an important problem in applications such as knowledge base completion (KBC). It can be formulated as a reinforcement learning (RL) problem with a known state transition model. To overcome the challenge of sparse rewards, we develop a graph-walking agent called M-Walk, which consists of a deep recurrent neural network (RNN) and Monte Carlo Tree Search (MCTS). The RNN encodes the state (i.e., history of the walked path) and maps it separately to a policy and Q-values. In order to effectively train the agent from sparse rewards, we combine MCTS with the neural policy to generate trajectories yielding more positive rewards.
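A minimal sketch of this encoder/two-head design is given below, assuming a PyTorch implementation with a GRU as the recurrent encoder. The class name, dimensions, and action-masking scheme are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MWalkAgentSketch(nn.Module):
    """Encodes the walked-path history with a GRU and maps the hidden
    state separately to a policy over candidate actions and to Q-values."""

    def __init__(self, node_emb_dim, hidden_dim, max_actions):
        super().__init__()
        self.rnn = nn.GRU(node_emb_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, max_actions)
        self.q_head = nn.Linear(hidden_dim, max_actions)

    def forward(self, path_embeddings, action_mask):
        # path_embeddings: (batch, path_len, node_emb_dim) -- embeddings of
        # the nodes visited so far (the state is the path history).
        # action_mask: (batch, max_actions) bool -- which edges exist here.
        _, h = self.rnn(path_embeddings)           # h: (1, batch, hidden_dim)
        h = h.squeeze(0)
        logits = self.policy_head(h)
        # Mask out actions (edges) unavailable at the current node.
        logits = logits.masked_fill(~action_mask, float("-inf"))
        policy = F.softmax(logits, dim=-1)         # prior used by MCTS
        q_values = self.q_head(h)                  # Q(s, a) estimates
        return policy, q_values

# Example usage (shapes only; embeddings and masks are placeholders):
agent = MWalkAgentSketch(node_emb_dim=32, hidden_dim=64, max_actions=10)
path = torch.randn(2, 5, 32)                # batch of 2 paths of length 5
mask = torch.ones(2, 10, dtype=torch.bool)  # all 10 actions available
policy, q = agent(path, mask)
```

During training, MCTS uses the policy network as a prior over actions, and the networks are then refit on the higher-reward trajectories the search generates, which is how the agent copes with the sparse terminal reward.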
We compared RAPS with the latest state-of-the-art work that incorporates DMPs with deep RL: Neural Dynamic Policies [6]. One question that may arise is: how useful is the dummy primitive? We ran an experiment with and without the dummy primitive to evaluate its impact, and found that including it improves performance significantly (a minimal sketch of the idea follows below). Each image depicts the solution of one of the tasks; we omit the bottom burner task because its goal is the same as that of the top burner task, just with a different dial to turn. In the sequential multi-task version of the environment, the goal within a single episode is to complete four different subtasks.
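To make the dummy primitive concrete, here is a minimal sketch of a primitive library in which one slot is reserved for a no-op, so the policy can explicitly choose to do nothing on a given macro-step. The function names, the 7-DoF zero-command convention, and the dispatch interface are all hypothetical illustrations, not the actual RAPS implementation.

```python
import numpy as np

def no_op(params):
    """Dummy primitive: ignores its parameters and commands no motion
    (assumed here to be a zero command for a 7-DoF arm)."""
    return np.zeros(7)

def reach(params):
    """Hypothetical 'reach' primitive parameterized by a target offset."""
    return np.clip(params[:7], -1.0, 1.0)

# Index 0 is the dummy primitive; the policy's discrete head picks an
# index and its continuous head supplies the primitive's parameters.
PRIMITIVES = [no_op, reach]

def execute(primitive_idx, params):
    """Dispatch the policy's (discrete choice, continuous params) action
    to a low-level command via the chosen primitive."""
    return PRIMITIVES[primitive_idx](np.asarray(params, dtype=np.float32))

print(execute(0, np.random.randn(7)))  # dummy choice -> all-zero command
```

The design rationale is that without such a slot, the agent is forced to execute some motion at every macro-step even when waiting or stopping would be optimal, which matches the ablation's finding that the dummy primitive helps significantly.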